Overview

Dataset statistics

Number of variables: 33
Number of observations: 15407
Missing cells: 71345
Missing cells (%): 14.0%
Duplicate rows: 2
Duplicate rows (%): < 0.1%
Total size in memory: 3.9 MiB
Average record size in memory: 264.0 B
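The percentages in this table can be rechecked from the raw counts. The short sketch below assumes that "Missing cells (%)" is the share of missing cells among all cells (variables times observations); the variable names are illustrative only.

```python
# Counts copied from the dataset statistics above.
n_variables = 33
n_observations = 15407
missing_cells = 71345
duplicate_rows = 2

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicates:    {duplicate_pct:.3f}%")
```

Under that assumption the result rounds to the reported 14.0%, and the duplicate share falls below the 0.1% display threshold.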

Variable types

CAT: 22
NUM: 11

Reproduction

Analysis started: 2020-06-27 02:28:18.785287
Analysis finished: 2020-06-27 02:28:38.962047
Duration: 20.18 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
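A report like this one can also be produced from Python rather than the command line. The following is a minimal sketch, assuming pandas and pandas-profiling 2.x are installed; the helper name build_profile_report and its arguments are hypothetical, and it does not load the config.yaml used for this run.

```python
from pathlib import Path


def build_profile_report(csv_path, html_path, title="Profiling report"):
    """Profile a CSV file and write the report as HTML (hypothetical helper)."""
    # Imported lazily so this module still loads where the optional
    # third-party packages are not installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title=title)
    report.to_file(Path(html_path))
    return report
```

Calling build_profile_report("YOUR_FILE.csv", "report.html") would write the HTML report next to the script; the settings would be the library defaults, so the output may differ from the run recorded above.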

Warnings

Dataset has 2 (< 0.1%) duplicate rows (Duplicates)
atty_firm_name has a high cardinality: 453 distinct values (High cardinality)
detail_cause has a high cardinality: 63 distinct values (High cardinality)
how_injury_occur has a high cardinality: 15344 distinct values (High cardinality)
injury_city has a high cardinality: 1340 distinct values (High cardinality)
injury_postal has a high cardinality: 1841 distinct values (High cardinality)
injury_state has a high cardinality: 53 distinct values (High cardinality)
detail_cause is highly correlated with cause (High correlation)
cause is highly correlated with detail_cause (High correlation)
osha_injury_type is highly correlated with nature_injury (High correlation)
nature_injury is highly correlated with osha_injury_type and 1 other field (High correlation)
type_loss is highly correlated with nature_injury (High correlation)
ave_wkly_wage has 9535 (61.9%) missing values (Missing)
claimant_age has 2163 (14.0%) missing values (Missing)
atty_firm_name has 12410 (80.5%) missing values (Missing)
marital_status has 12693 (82.4%) missing values (Missing)
depart_code has 8123 (52.7%) missing values (Missing)
injury_postal has 3684 (23.9%) missing values (Missing)
#dependents has 14808 (96.1%) missing values (Missing)
severity_index has 344 (2.2%) missing values (Missing)
reforms_dummy has 6313 (41.0%) missing values (Missing)
length_employed has 901 (5.8%) missing values (Missing)
Dependent is highly skewed (γ1 = 27.90114494) (Skewed)
how_injury_occur is uniformly distributed (Uniform)
Dependent has 2689 (17.5%) zeros (Zeros)
time_injury has 2828 (18.4%) zeros (Zeros)
diff_carrier_employer has 3225 (20.9%) zeros (Zeros)
diff_employer_injury has 11196 (72.7%) zeros (Zeros)
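The high-cardinality warnings above are consistent with a cutoff of 50 distinct values: injury_state (53) is flagged, while claim_st (49) and jurisdiction (47) are not. The sketch below applies that rule to the distinct counts reported for the categorical variables; the cutoff is inferred from this report and should be treated as an assumption, not a documented setting.

```python
# Distinct counts copied from the per-variable summaries in this report.
distinct_counts = {
    "atty_firm_name": 453,
    "detail_cause": 63,
    "how_injury_occur": 15344,
    "injury_city": 1340,
    "injury_postal": 1841,
    "injury_state": 53,
    "claim_st": 49,
    "jurisdiction": 47,
    "body_part": 46,
    "nature_injury": 45,
    "handling_office": 28,
}

# Assumption: a variable is flagged when it has more than 50 distinct values.
CARDINALITY_CUTOFF = 50

high_cardinality = sorted(
    name for name, count in distinct_counts.items() if count > CARDINALITY_CUTOFF
)
print(high_cardinality)
```

With this cutoff the flagged set matches the six variables listed in the warnings.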

Variables

Dependent
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 5024
Unique (%): 32.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10285.41370805478
Minimum: 0.0
Maximum: 3774290.0
Zeros: 2689
Zeros (%): 17.5%
Memory size: 120.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 152
median: 446
Q3: 1703.5
95-th percentile: 51491.9
Maximum: 3774290
Range: 3774290
Interquartile range (IQR): 1551.5

Descriptive statistics

Standard deviation: 54196.59903
Coefficient of variation (CV): 5.269267778
Kurtosis: 1591.997702
Mean: 10285.41371
Median Absolute Deviation (MAD): 446
Skewness: 27.90114494
Sum: 158467369
Variance: 2937271346
Histogram with fixed size bins (bins=10)
Value                  Count   Frequency (%)
0                       2689   17.5%
154                       42   0.3%
150                       32   0.2%
222                       28   0.2%
215                       27   0.2%
180                       27   0.2%
3                         24   0.2%
232                       24   0.2%
199                       23   0.1%
167                       23   0.1%
Other values (5014)    12468   80.9%

Value                  Count   Frequency (%)
0                       2689   17.5%
1                          2   < 0.1%
2                          2   < 0.1%
3                         24   0.2%
4                          9   0.1%

Value                  Count   Frequency (%)
3774290                    1   < 0.1%
1159631                    1   < 0.1%
1086522                    1   < 0.1%
1083067                    1   < 0.1%
1076883                    1   < 0.1%
 

ave_wkly_wage
Real number (ℝ≥0)

MISSING

Distinct count1966
Unique (%)33.5%
Missing9535
Missing (%)61.9%
Infinite0
Infinite (%)0.0%
Mean1148.0342302452316
Minimum2.0
Maximum9999.0
Zeros0
Zeros (%)0.0%
Memory size120.4 KiB

Quantile statistics

Minimum2
5-th percentile150
Q1500
median1000
Q31529.25
95-th percentile2669
Maximum9999
Range9997
Interquartile range (IQR)1029.25

Descriptive statistics

Standard deviation920.9787874
Coefficient of variation (CV)0.8022224104
Kurtosis13.36293414
Mean1148.03423
Median Absolute Deviation (MAD)500
Skewness2.590677501
Sum6741257
Variance848201.9268
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
5003832.5%
 
3202041.3%
 
10001801.2%
 
6001491.0%
 
1501260.8%
 
1001210.8%
 
400840.5%
 
1500700.5%
 
1200680.4%
 
300610.4%
 
Other values (1956)442628.7%
 
(Missing)953561.9%
 
ValueCountFrequency (%) 
23< 0.1%
 
32< 0.1%
 
52< 0.1%
 
72< 0.1%
 
81< 0.1%
 
ValueCountFrequency (%) 
99991< 0.1%
 
93161< 0.1%
 
92001< 0.1%
 
90001< 0.1%
 
89001< 0.1%
 

body_part
Categorical

Distinct count46
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
Finger(s)
 
1426
Low Back Area
 
1317
Knee
 
1236
Other Facial Soft Tissue
 
1113
Eye(s)
 
1028
Other values (41)
9287
ValueCountFrequency (%) 
Finger(s)14269.3%
 
Low Back Area13178.5%
 
Knee12368.0%
 
Other Facial Soft Tissue11137.2%
 
Eye(s)10286.7%
 
Ankle9095.9%
 
Hand8165.3%
 
Shoulder(s)7404.8%
 
Foot6174.0%
 
Lower Leg5683.7%
 
Other values (36)563736.6%
 

Length

Max length54
Median length6
Mean length10.31161161
Min length3

cause
Categorical

HIGH CORRELATION

Distinct count10
Unique (%)0.1%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
Strain or Injury By
4174
Fall, Slip or Trip Injury
2662
Struck or Injured By
2533
Miscellaneous Causes
2004
Cut, Puncture, Scrape Injured By
1878
Other values (5)
2156
ValueCountFrequency (%) 
Strain or Injury By417427.1%
 
Fall, Slip or Trip Injury266217.3%
 
Struck or Injured By253316.4%
 
Miscellaneous Causes200413.0%
 
Cut, Puncture, Scrape Injured By187812.2%
 
Striking Against or Stepping on10096.5%
 
Caught In, Under or Between5583.6%
 
Motor Vehicle3582.3%
 
Burn or Scald - Heat or Cold Exposure2111.4%
 
Rubbed or Abraded By200.1%
 

Length

Max length37
Median length20
Mean length23.09975985
Min length13

claimant_age
Real number (ℝ≥0)

MISSING

Distinct count84
Unique (%)0.6%
Missing2163
Missing (%)14.0%
Infinite0
Infinite (%)0.0%
Mean40.286922379945636
Minimum1.0
Maximum91.0
Zeros0
Zeros (%)0.0%
Memory size120.4 KiB

Quantile statistics

Minimum1
5-th percentile23
Q131
median40
Q349
95-th percentile60
Maximum91
Range90
Interquartile range (IQR)18

Descriptive statistics

Standard deviation11.82069754
Coefficient of variation (CV)0.2934127713
Kurtosis-0.5246541589
Mean40.28692238
Median Absolute Deviation (MAD)9
Skewness0.1788198528
Sum533560
Variance139.7288904
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
414362.8%
 
433822.5%
 
363782.5%
 
473772.4%
 
463762.4%
 
453692.4%
 
403692.4%
 
393662.4%
 
343642.4%
 
243612.3%
 
Other values (74)946661.4%
 
(Missing)216314.0%
 
ValueCountFrequency (%) 
16< 0.1%
 
22< 0.1%
 
32< 0.1%
 
42< 0.1%
 
51< 0.1%
 
ValueCountFrequency (%) 
911< 0.1%
 
891< 0.1%
 
851< 0.1%
 
841< 0.1%
 
832< 0.1%
 

atty_firm_name
Categorical

HIGH CARDINALITY
MISSING

Distinct count453
Unique (%)15.1%
Missing12410
Missing (%)80.5%
Memory size120.4 KiB
TLEVY, STERN & FORD
 
116
TJ. LEEDS BARROL, IV ATTORNEY AT LAW
 
58
TLEVY, FORD & WALLACH
 
51
TLEVY, STERN, & FORD
 
50
TCARUSO, SPILLANE, LEIGHTON,CONTRASTANO, ULANER & SAVINO
 
37
Other values (448)
2685
ValueCountFrequency (%) 
TLEVY, STERN & FORD1160.8%
 
TJ. LEEDS BARROL, IV ATTORNEY AT LAW580.4%
 
TLEVY, FORD & WALLACH510.3%
 
TLEVY, STERN, & FORD500.3%
 
TCARUSO, SPILLANE, LEIGHTON,CONTRASTANO, ULANER & SAVINO370.2%
 
TLAW OFFICES OF MCNAMARA &360.2%
 
TMARDER, ESKESEN & NASS350.2%
 
TKLEIN WAGNER & MORRIS340.2%
 
TKLEE & WOOLF, LLP330.2%
 
TLAW OFFICE OF CHRISTINE T. NELSON330.2%
 
Other values (443)251416.3%
 
(Missing)1241080.5%
 

Length

Max length57
Median length3
Mean length7.532809762
Min length3

gender
Categorical

Distinct count3
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
Male
12375
Female
2950
Uknown
 
82
ValueCountFrequency (%) 
Male1237580.3%
 
Female295019.1%
 
Uknown820.5%
 

Length

Max length6
Median length4
Mean length4.39358733
Min length4

marital_status
Categorical

MISSING

Distinct count3
Unique (%)0.1%
Missing12693
Missing (%)82.4%
Memory size120.4 KiB
Unmarried, Single, Widowed, Divorced
1435
Married
1258
Separated
 
21
ValueCountFrequency (%) 
Unmarried, Single, Widowed, Divorced14359.3%
 
Married12588.2%
 
Separated210.1%
 
(Missing)1269382.4%
 

Length

Max length36
Median length3
Mean length6.408385799
Min length3

claim_st
Categorical

Distinct count49
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
California
8912
New York
 
1121
New Mexico
 
610
Georgia
 
532
Texas
 
517
Other values (44)
3715
ValueCountFrequency (%) 
California891257.8%
 
New York11217.3%
 
New Mexico6104.0%
 
Georgia5323.5%
 
Texas5173.4%
 
North Carolina4993.2%
 
Louisiana4372.8%
 
New Jersey3412.2%
 
Florida2361.5%
 
Illinois2241.5%
 
Other values (39)197812.8%
 

Length

Max length25
Median length10
Mean length9.481469462
Min length4

depart_code
Real number (ℝ≥0)

MISSING

Distinct count23
Unique (%)0.3%
Missing8123
Missing (%)52.7%
Infinite0
Infinite (%)0.0%
Mean12.119989017023613
Minimum1.0
Maximum23.0
Zeros0
Zeros (%)0.0%
Memory size120.4 KiB

Quantile statistics

Minimum1
5-th percentile2
Q16
median14
Q318
95-th percentile21
Maximum23
Range22
Interquartile range (IQR)12

Descriptive statistics

Standard deviation7.014638483
Coefficient of variation (CV)0.5787660758
Kurtosis-1.477812356
Mean12.11998902
Median Absolute Deviation (MAD)6
Skewness-0.1316306344
Sum88282
Variance49.20515304
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
2111647.6%
 
179776.3%
 
88565.6%
 
36934.5%
 
65763.7%
 
25313.4%
 
144673.0%
 
184092.7%
 
113392.2%
 
12321.5%
 
Other values (13)10406.8%
 
(Missing)812352.7%
 
ValueCountFrequency (%) 
12321.5%
 
25313.4%
 
36934.5%
 
4970.6%
 
5880.6%
 
ValueCountFrequency (%) 
23520.3%
 
221210.8%
 
2111647.6%
 
201781.2%
 
191420.9%
 

detail_cause
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct count63
Unique (%)0.4%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
Strain/Injury by Misc
 
1577
Strain/Injury by Lifting
 
1193
Struck by Falling/Flying Object
 
916
Misc, Other
 
838
Fall/Slip, Same Level
 
795
Other values (58)
10088
ValueCountFrequency (%) 
Strain/Injury by Misc157710.2%
 
Strain/Injury by Lifting11937.7%
 
Struck by Falling/Flying Object9165.9%
 
Misc, Other8385.4%
 
Fall/Slip, Same Level7955.2%
 
Misc, Foreign Body in Eye7785.0%
 
Cut/Puncture/Scrape, Object Lift/Handled7514.9%
 
Strike/Step On, Fixed Object7114.6%
 
Fall/Slip, Misc6314.1%
 
Fall/Slip, Different Level5453.5%
 
Other values (53)667243.3%
 

Length

Max length40
Median length25
Mean length25.54812747
Min length9

domestic_foreign
Categorical

Distinct count2
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
Domestic
15190
Foreign
 
217
ValueCountFrequency (%) 
Domestic1519098.6%
 
Foreign2171.4%
 

Length

Max length8
Median length8
Mean length7.985915493
Min length7

employ_status
Categorical

Distinct count12
Unique (%)0.1%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
Unknown/Other
12564
Full-Time
 
2440
Seasonal
 
190
Part-Time
 
97
Piece Worker
 
66
Other values (7)
 
50
ValueCountFrequency (%) 
Unknown/Other1256481.5%
 
Full-Time244015.8%
 
Seasonal1901.2%
 
Part-Time970.6%
 
Piece Worker660.4%
 
On Strike180.1%
 
Apprenticeship Full-Time140.1%
 
Disabled90.1%
 
Retired3< 0.1%
 
Not Employed2< 0.1%
 
Other values (2)4< 0.1%
 

Length

Max length24
Median length13
Mean length12.27740637
Min length7

handling_office
Categorical

Distinct count28
Unique (%)0.2%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
LOS ANGELE
7112
SACRAMENTO
1884
DALLAS WC
 
1345
NEW JERSEY
 
861
LONG ISLAN
 
534
Other values (23)
3671
ValueCountFrequency (%) 
LOS ANGELE711246.2%
 
SACRAMENTO188412.2%
 
DALLAS WC13458.7%
 
NEW JERSEY8615.6%
 
LONG ISLAN5343.5%
 
WC SOUTHEA5223.4%
 
CHARLOTTE5153.3%
 
IN-STATE A3532.3%
 
ATLANTA2971.9%
 
ILLINOIS2811.8%
 
Other values (18)170311.1%
 

Length

Max length10
Median length10
Mean length9.69942234
Min length6

how_injury_occur
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count15344
Unique (%)99.6%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
WORKING IN WOODED TALL GRASS AREA - EE WAS BITTEN BY DEER TI
 
17
UNKNOWN
 
7
EMPLOYEE WAS EXPOSED TO HEPATITIS A VIRUS WHILE WORKING ON S
 
7
WHILE WORKING IN A WOODED - TALL GRASS AREA - EE WAS BITTEN
 
7
EE WAS EXPOSED TO HEPATITIS A VIRUS WHILE WORKING ON SET.
 
5
Other values (15339)
15364
ValueCountFrequency (%) 
WORKING IN WOODED TALL GRASS AREA - EE WAS BITTEN BY DEER TI170.1%
 
UNKNOWN7< 0.1%
 
EMPLOYEE WAS EXPOSED TO HEPATITIS A VIRUS WHILE WORKING ON S7< 0.1%
 
WHILE WORKING IN A WOODED - TALL GRASS AREA - EE WAS BITTEN7< 0.1%
 
EE WAS EXPOSED TO HEPATITIS A VIRUS WHILE WORKING ON SET.5< 0.1%
 
EE WAS EXPOSED TO HEPATITIS A VIRUS WHILE WORKING ON SET4< 0.1%
 
EE WAS EXPOSED TO HEPATITIS A VIRUS WHILE WORKING ON SET - P3< 0.1%
 
EE INHALED SMOKE FROM A FIRE THAT STARTED WHEN A HOT WIRE WA2< 0.1%
 
EMPLOYEE WAS WORKING ON SET WHEN HE EXPERIENCED SHORTNESS OF2< 0.1%
 
WHILE WORKING IN WOODED - TALL GRASS AREA - EMPLOYEE WAS BIT2< 0.1%
 
Other values (15334)1535199.6%
 

Length

Max length60
Median length60
Mean length57.90601675
Min length7

injury_city
Categorical

HIGH CARDINALITY

Distinct count1340
Unique (%)8.7%
Missing1
Missing (%)< 0.1%
Memory size120.4 KiB
LOS ANGELES
2294
UNKNOWN
 
1538
BURBANK
 
1506
NEW ORLEANS
 
610
NEW YORK
 
393
Other values (1335)
9065
ValueCountFrequency (%) 
LOS ANGELES229414.9%
 
UNKNOWN153810.0%
 
BURBANK15069.8%
 
NEW ORLEANS6104.0%
 
NEW YORK3932.6%
 
BROOKLYN3882.5%
 
WILMINGTON3282.1%
 
CULVER CITY2981.9%
 
AUSTIN2611.7%
 
ATLANTA2221.4%
 
Other values (1330)756849.1%
 

Length

Max length19
Median length8
Mean length9.119101707
Min length1

injury_postal
Categorical

HIGH CARDINALITY
MISSING

Distinct count1841
Unique (%)15.7%
Missing3684
Missing (%)23.9%
Memory size120.4 KiB
91502
 
1369
95816
 
406
90038
 
303
90001
 
222
90028
 
219
Other values (1836)
9204
ValueCountFrequency (%) 
9150213698.9%
 
958164062.6%
 
900383032.0%
 
900012221.4%
 
900282191.4%
 
915051991.3%
 
902321711.1%
 
915041671.1%
 
916081581.0%
 
915211270.8%
 
Other values (1831)838254.4%
 
(Missing)368423.9%
 

Length

Max length9
Median length5
Mean length4.513467904
Min length3

injury_state
Categorical

HIGH CARDINALITY

Distinct count53
Unique (%)0.3%
Missing12
Missing (%)0.1%
Memory size120.4 KiB
California
8019
New York
 
1356
Louisiana
 
1132
Georgia
 
654
North Carolina
 
583
Other values (48)
3651
ValueCountFrequency (%) 
California801952.0%
 
New York13568.8%
 
Louisiana11327.3%
 
Georgia6544.2%
 
North Carolina5833.8%
 
Texas4032.6%
 
New Mexico3832.5%
 
Pennsylvania3112.0%
 
Michigan2741.8%
 
Utah2631.7%
 
Other values (43)201713.1%
 

Length

Max length25
Median length10
Mean length9.506003765
Min length3

jurisdiction
Categorical

Distinct count47
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
California
9094
New York
 
1525
Louisiana
 
851
Georgia
 
565
North Carolina
 
535
Other values (42)
2837
ValueCountFrequency (%) 
California909459.0%
 
New York15259.9%
 
Louisiana8515.5%
 
Georgia5653.7%
 
North Carolina5353.5%
 
Texas3532.3%
 
New Mexico3322.2%
 
Pennsylvania2441.6%
 
Illinois2411.6%
 
Utah2361.5%
 
Other values (37)14319.3%
 

Length

Max length20
Median length10
Mean length9.506523009
Min length4

lost_time_or_medicalonly
Categorical

Distinct count2
Unique (%)< 0.1%
Missing11
Missing (%)0.1%
Memory size120.4 KiB
Medical Only
11684
Lost Time
3712
ValueCountFrequency (%) 
Medical Only1168475.8%
 
Lost Time371224.1%
 
(Missing)110.1%
 

Length

Max length12
Median length12
Mean length11.27078601
Min length3

nature_injury
Categorical

HIGH CORRELATION

Distinct count45
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
Strain
3583
Laceration
2418
Specific Injury - All Other
2221
Contusion
1610
Sprain
1025
Other values (40)
4550
ValueCountFrequency (%) 
Strain358323.3%
 
Laceration241815.7%
 
Specific Injury - All Other222114.4%
 
Contusion161010.4%
 
Sprain10256.7%
 
Puncture7855.1%
 
Foreign Body7594.9%
 
Inflammation6704.3%
 
Fracture5703.7%
 
Infection2001.3%
 
Other values (35)156610.2%
 

Length

Max length59
Median length9
Mean length11.81729084
Min length4

#dependents
Real number (ℝ≥0)

MISSING

Distinct count9
Unique (%)1.5%
Missing14808
Missing (%)96.1%
Infinite0
Infinite (%)0.0%
Mean1.9131886477462436
Minimum1.0
Maximum18.0
Zeros0
Zeros (%)0.0%
Memory size120.4 KiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median2
Q32
95-th percentile4
Maximum18
Range17
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.334821653
Coefficient of variation (CV)0.6976947384
Kurtosis38.79484103
Mean1.913188648
Median Absolute Deviation (MAD)1
Skewness4.345641318
Sum1146
Variance1.781748846
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
12811.8%
 
21871.2%
 
3840.5%
 
4290.2%
 
580.1%
 
64< 0.1%
 
93< 0.1%
 
72< 0.1%
 
181< 0.1%
 
(Missing)1480896.1%
 
ValueCountFrequency (%) 
12811.8%
 
21871.2%
 
3840.5%
 
4290.2%
 
580.1%
 
ValueCountFrequency (%) 
181< 0.1%
 
93< 0.1%
 
72< 0.1%
 
64< 0.1%
 
580.1%
 

osha_injury_type
Categorical

HIGH CORRELATION

Distinct count6
Unique (%)< 0.1%
Missing2
Missing (%)< 0.1%
Memory size120.4 KiB
Injury
15128
Skin disorder
 
156
Respiratory condition
 
60
Poisoning
 
27
Hearing loss
 
26
ValueCountFrequency (%) 
Injury1512898.2%
 
Skin disorder1561.0%
 
Respiratory condition600.4%
 
Poisoning270.2%
 
Hearing loss260.2%
 
All other illnesses80.1%
 
(Missing)2< 0.1%
 

Length

Max length21
Median length6
Mean length6.151035244
Min length3

severity_index
Categorical

MISSING

Distinct count10
Unique (%)0.1%
Missing344
Missing (%)2.2%
Memory size120.4 KiB
No Serious Injury Indicated
15000
Fatality
 
19
Fractured Bone(s)
 
15
Back Injury involving Surgery/Extended Disability
 
10
Involves AIDS, Herpes, TSS, Cancer, Other Diseases
 
5
Other values (5)
 
14
ValueCountFrequency (%) 
No Serious Injury Indicated1500097.4%
 
Fatality190.1%
 
Fractured Bone(s)150.1%
 
Back Injury involving Surgery/Extended Disability100.1%
 
Involves AIDS, Herpes, TSS, Cancer, Other Diseases5< 0.1%
 
Serious Head Injury4< 0.1%
 
Heart Attack or Cardio-Vascular Accident4< 0.1%
 
Minor Amputation2< 0.1%
 
Serious Burns2< 0.1%
 
Major Amputation2< 0.1%
 
(Missing)3442.2%
 

Length

Max length50
Median length27
Mean length26.44934121
Min length3

time_injury
Real number (ℝ≥0)

ZEROS

Distinct count581
Unique (%)3.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1010.9847471928344
Minimum0
Maximum2359
Zeros2828
Zeros (%)18.4%
Memory size120.4 KiB

Quantile statistics

Minimum0
5-th percentile0
Q1615
median1035
Q31510
95-th percentile2025
Maximum2359
Range2359
Interquartile range (IQR)895

Descriptive statistics

Standard deviation660.7386831
Coefficient of variation (CV)0.6535594973
Kurtosis-0.9523935227
Mean1010.984747
Median Absolute Deviation (MAD)465
Skewness-0.1933581149
Sum15576242
Variance436575.6074
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0282818.4%
 
10006614.3%
 
14005383.5%
 
11005303.4%
 
15004703.1%
 
9004072.6%
 
16004012.6%
 
13003862.5%
 
8003842.5%
 
17003001.9%
 
Other values (571)850255.2%
 
ValueCountFrequency (%) 
0282818.4%
 
12< 0.1%
 
21< 0.1%
 
41< 0.1%
 
5150.1%
 
ValueCountFrequency (%) 
23592< 0.1%
 
23557< 0.1%
 
23531< 0.1%
 
23507< 0.1%
 
23481< 0.1%
 

type_loss
Categorical

HIGH CORRELATION

Distinct count2
Unique (%)< 0.1%
Missing53
Missing (%)0.3%
Memory size120.4 KiB
Specific Injury
15195
Cumulative Trauma
 
159
ValueCountFrequency (%) 
Specific Injury1519598.6%
 
Cumulative Trauma1591.0%
 
(Missing)530.3%
 

Length

Max length17
Median length15
Mean length14.97936003
Min length3

policy_yr
Real number (ℝ≥0)

Distinct count15
Unique (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2007.6086194586876
Minimum2000
Maximum2014
Zeros0
Zeros (%)0.0%
Memory size120.4 KiB

Quantile statistics

Minimum2000
5-th percentile2000
Q12004
median2008
Q32011
95-th percentile2014
Maximum2014
Range14
Interquartile range (IQR)7

Descriptive statistics

Standard deviation4.268828711
Coefficient of variation (CV)0.002126325156
Kurtosis-1.203913341
Mean2007.608619
Median Absolute Deviation (MAD)4
Skewness-0.1645031735
Sum30931226
Variance18.22289857
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
201114409.3%
 
201313258.6%
 
201211967.8%
 
200511417.4%
 
201411237.3%
 
201010957.1%
 
200410296.7%
 
20089696.3%
 
20029506.2%
 
20099326.0%
 
Other values (5)420727.3%
 
ValueCountFrequency (%) 
20007785.0%
 
20016974.5%
 
20029506.2%
 
20039085.9%
 
200410296.7%
 
ValueCountFrequency (%) 
201411237.3%
 
201313258.6%
 
201211967.8%
 
201114409.3%
 
201010957.1%
 

reforms_dummy
Categorical

MISSING

Distinct count3
Unique (%)< 0.1%
Missing6313
Missing (%)41.0%
Memory size120.4 KiB
California Refom 1
4983
California Refom 0
3022
California Reform 2
1089
ValueCountFrequency (%) 
California Refom 1498332.3%
 
California Refom 0302219.6%
 
California Reform 210897.1%
 
(Missing)631341.0%
 

Length

Max length19
Median length18
Mean length11.92444993
Min length3

length_employed
Real number (ℝ≥0)

MISSING

Distinct count46
Unique (%)0.3%
Missing901
Missing (%)5.8%
Infinite0
Infinite (%)0.0%
Mean7.692954639459534
Minimum1.0
Maximum60.0
Zeros0
Zeros (%)0.0%
Memory size120.4 KiB

Quantile statistics

Minimum1
5-th percentile1
Q14
median7
Q311
95-th percentile15
Maximum60
Range59
Interquartile range (IQR)7

Descriptive statistics

Standard deviation4.677775864
Coefficient of variation (CV)0.6080597226
Kurtosis6.983011565
Mean7.692954639
Median Absolute Deviation (MAD)4
Skewness1.152591289
Sum111594
Variance21.88158704
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
413528.8%
 
212398.0%
 
311277.3%
 
1110827.0%
 
510616.9%
 
1010486.8%
 
79336.1%
 
139206.0%
 
129055.9%
 
68835.7%
 
Other values (36)395625.7%
 
(Missing)9015.8%
 
ValueCountFrequency (%) 
18695.6%
 
212398.0%
 
311277.3%
 
413528.8%
 
510616.9%
 
ValueCountFrequency (%) 
601< 0.1%
 
551< 0.1%
 
541< 0.1%
 
521< 0.1%
 
513< 0.1%
 

diff_carrier_employer
Real number (ℝ)

ZEROS

Distinct count247
Unique (%)1.6%
Missing146
Missing (%)0.9%
Infinite0
Infinite (%)0.0%
Mean7.929559006618177
Minimum-1094.0
Maximum1994.0
Zeros3225
Zeros (%)20.9%
Memory size120.4 KiB

Quantile statistics

Minimum-1094
5-th percentile0
Q11
median2
Q35
95-th percentile27
Maximum1994
Range3088
Interquartile range (IQR)4

Descriptive statistics

Standard deviation44.78888828
Coefficient of variation (CV)5.648345418
Kurtosis656.5273028
Mean7.929559007
Median Absolute Deviation (MAD)2
Skewness18.61863324
Sum121013
Variance2006.044513
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
1379224.6%
 
0322520.9%
 
2157910.2%
 
313799.0%
 
410496.8%
 
57114.6%
 
65763.7%
 
74522.9%
 
82881.9%
 
91671.1%
 
Other values (237)204313.3%
 
(Missing)1460.9%
 
ValueCountFrequency (%) 
-10941< 0.1%
 
-6931< 0.1%
 
-3631< 0.1%
 
-3621< 0.1%
 
-3461< 0.1%
 
ValueCountFrequency (%) 
19941< 0.1%
 
17951< 0.1%
 
12451< 0.1%
 
12131< 0.1%
 
11412< 0.1%
 

diff_employer_injury
Real number (ℝ)

ZEROS

Distinct count415
Unique (%)2.7%
Missing146
Missing (%)0.9%
Infinite0
Infinite (%)0.0%
Mean19.966253849682197
Minimum-31.0
Maximum4200.0
Zeros11196
Zeros (%)72.7%
Memory size120.4 KiB

Quantile statistics

Minimum-31
5-th percentile0
Q10
median0
Q31
95-th percentile20
Maximum4200
Range4231
Interquartile range (IQR)1

Descriptive statistics

Standard deviation163.6920873
Coefficient of variation (CV)8.198437651
Kurtosis206.7916229
Mean19.96625385
Median Absolute Deviation (MAD)0
Skewness12.992533
Sum304705
Variance26795.09945
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
01119672.7%
 
113819.0%
 
24382.8%
 
33562.3%
 
42451.6%
 
51631.1%
 
71260.8%
 
61190.8%
 
8640.4%
 
9600.4%
 
Other values (405)11137.2%
 
(Missing)1460.9%
 
ValueCountFrequency (%) 
-311< 0.1%
 
-101< 0.1%
 
-51< 0.1%
 
-31< 0.1%
 
-21< 0.1%
 
ValueCountFrequency (%) 
42001< 0.1%
 
38891< 0.1%
 
37651< 0.1%
 
35241< 0.1%
 
33341< 0.1%
 

shift
Categorical

Distinct count3
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size120.4 KiB
2nd
7084
1st
6084
3rd
2239
ValueCountFrequency (%) 
2nd708446.0%
 
1st608439.5%
 
3rd223914.5%
 

Length

Max length3
Median length3
Mean length3
Min length3

length_how_injury
Real number (ℝ≥0)

Distinct count51
Unique (%)0.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean57.9060167456351
Minimum7
Maximum60
Zeros0
Zeros (%)0.0%
Memory size120.4 KiB

Quantile statistics

Minimum7
5-th percentile45
Q159
median60
Q360
95-th percentile60
Maximum60
Range53
Interquartile range (IQR)1

Descriptive statistics

Standard deviation5.702373679
Coefficient of variation (CV)0.09847635875
Kurtosis17.49582622
Mean57.90601675
Median Absolute Deviation (MAD)0
Skewness-3.880200295
Sum892158
Variance32.51706557
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
601035767.2%
 
59252816.4%
 
582381.5%
 
572251.5%
 
561671.1%
 
551631.1%
 
541370.9%
 
531350.9%
 
511330.9%
 
491140.7%
 
Other values (41)12107.9%
 
ValueCountFrequency (%) 
77< 0.1%
 
81< 0.1%
 
101< 0.1%
 
111< 0.1%
 
122< 0.1%
 
ValueCountFrequency (%) 
601035767.2%
 
59252816.4%
 
582381.5%
 
572251.5%
 
561671.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a perfectly linear relationship the slope of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
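The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code used to produce the report; the function name pearson_r is an assumption. The factor 1/n cancels between the covariance and the standard deviations, so sums of products are used directly.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products; the common 1/n factor cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
# Shifting and rescaling ys by a positive factor leaves r unchanged.
r_rescaled = pearson_r(xs, [3 * y + 7 for y in ys])
```

The rescaled call illustrates the location and scale invariance mentioned in the text.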

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
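In other words, ρ is Pearson's r applied to ranks. The sketch below follows that recipe in plain Python. Assigning tied values the average of their ranks is a common convention and is assumed here; the function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear
rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
```

For the cubic example the ranks of xs and ys coincide, so ρ reaches 1 while Pearson's r stays below 1.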

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
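As an illustration of the bias correction, the sketch below computes Cramér's V from a contingency table of counts using the corrected φ² and corrected table dimensions from Bergsma's proposal. It is a plain-Python sketch under that reading of the formula, not the report's own code; the function name and the list-of-rows input format are assumptions.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective numbers of rows and columns.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


perfect = cramers_v_corrected([[50, 0], [0, 50]])
independent = cramers_v_corrected([[25, 25], [25, 25]])
```

A perfectly associated table yields 1 and an independent one yields 0, which matches the range described above.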

Missing values

Sample

First rows

Dependentave_wkly_wagebody_partcauseclaimant_ageatty_firm_namegendermarital_statusclaim_stdepart_codedetail_causedomestic_foreignemploy_statushandling_officehow_injury_occurinjury_cityinjury_postalinjury_statejurisdictionlost_time_or_medicalonlynature_injury#dependentsosha_injury_typeseverity_indextime_injurytype_losspolicy_yrreforms_dummylength_employeddiff_carrier_employerdiff_employer_injuryshiftlength_how_injury
098679.0500.0PelvisStruck or Injured By21.0NaNFemaleNaNCaliforniaNaNStruck by Falling/Flying ObjectDomesticFull-TimeLOS ANGELEGOING DOWN SKI HILL ON AN AIR MATTRESS AS PART OF A SKIT,PORTLAND97201OregonCaliforniaLost TimeFractureNaNInjuryFractured Bone(s)1430Specific Injury2001California Refom 014.04.00.02nd57
155727.01037.0Low Back AreaStrain or Injury ByNaNNaNMaleNaNCaliforniaNaNStrain/Injury by LiftingDomesticFull-TimeIN-STATE AREPETITIVE LIFTING OF CABLE, EE STRAINED LOWER BACKPEARL HARBORNaNHawaiiHawaiiLost TimeStrain1.0InjuryNo Serious Injury Indicated0Specific Injury2001NaN14.01.02.01st51
2185833.0929.0Low Back AreaStrain or Injury By63.0TROBINSON & CHUR ATTORNEYS AT LAWMaleMarriedHawaiiNaNStrain/Injury by CarryingDomesticUnknown/OtherIN-STATE ASTRAINED LOW BACK MOVING A PALM TREE WITH CO-WORKERPEARL HARBOR96860HawaiiHawaiiLost TimeStrainNaNInjuryNo Serious Injury Indicated0Specific Injury2001NaN14.09.00.01st51
398615.01226.0Multiple Body PartsStrain or Injury By49.0IBARRY STEVENS;;M;;MaleNaNIdahoNaNStrain/Injury by Repetitive MotionDomesticUnknown/OtherLOS ANGELEEE CLAIMS: CT 1973 -/02 TO BOTH SHOULDERS - SPINE - AND EARBURBANK91502CaliforniaCaliforniaLost TimeSpecific Injury - All OtherNaNInjuryNo Serious Injury Indicated0Specific Injury2002California Refom 0NaN8.0632.01st60
451396.0NaNOther Facial Soft TissueMiscellaneous Causes51.0IBARRY STEVENS;;M;;MaleNaNIdahoNaNMisc, No Physical CauseDomesticUnknown/OtherLOS ANGELEEMPLOYEE CLAIMS: CT 11/16/93 -8/1/03; PHYSICAL STRESS AND STBURBANK91502CaliforniaCaliforniaLost TimeSpecific Injury - All OtherNaNInjuryNo Serious Injury Indicated0Specific Injury2003California Refom 0NaN1.0200.01st60
54079.0NaNMultiple Upper ExtremitiesStrain or Injury By55.0IWAX & WAX;;A;;MaleNaNCaliforniaNaNStrain/Injury by Repetitive MotionDomesticUnknown/OtherLOS ANGELEEE CLAIMS: CT 1980 -NOV 2003 TO NECK - BACK - BOTH UPPER EXTBURBANK91502CaliforniaCaliforniaLost TimeCarpal Tunnel SyndromeNaNInjuryNo Serious Injury Indicated0Cumulative Trauma2002California Refom 0NaN0.0444.01st60
61909.01129.0Low Back AreaStrain or Injury By49.0TCRAIG RICHLINMaleNaNCaliforniaNaNStrain/Injury by MiscDomesticUnknown/OtherLOS ANGELEEE CLAIMS CT 09/25/00 09/25/01 TO LEFT LOWER EXTREMITY AND BUNKNOWNNaNCaliforniaCaliforniaLost TimeSpecific Injury - All OtherNaNInjuryNo Serious Injury Indicated0Specific Injury2000California Refom 015.03.01254.01st60
76687.0NaNLow Back AreaStrain or Injury By36.0NaNMaleNaNCaliforniaNaNStrain/Injury by Repetitive MotionDomesticUnknown/OtherLOS ANGELEEE CLAIMS REPETITIVE BACK INJURY CT 8/1/99-11/03 FROM REPETIMISSION HILLS91345CaliforniaCaliforniaLost TimeStrainNaNInjuryNo Serious Injury Indicated0Specific Injury2001California Refom 0NaN10.01223.01st60
85352.0NaNLow Back AreaStrain or Injury By45.0NaNMaleNaNCaliforniaNaNStrain/Injury by Repetitive MotionDomesticUnknown/OtherLOS ANGELEEE CLAIMS; CT 4/23/02-4/23/03 TO SPINE. BILATERAL UPPER EXTRBURBANK91504CaliforniaCaliforniaLost TimeStrainNaNInjuryNo Serious Injury Indicated0Specific Injury2003California Refom 0NaN6.0436.01st60
96324.01000.0Lower LegStrain or Injury By45.0NaNMaleNaNCaliforniaNaNStrain/Injury by Repetitive MotionDomesticFull-TimeLOS ANGELEEE CLAIMS: L LEG - VENUS THROMBOSIS WHILE WORKING IN LAS VEGLAS VEGAS89104NevadaCaliforniaLost TimeStrainNaNInjuryNo Serious Injury Indicated0Specific Injury2002California Refom 013.02.0642.01st60

Last rows

Dependentave_wkly_wagebody_partcauseclaimant_ageatty_firm_namegendermarital_statusclaim_stdepart_codedetail_causedomestic_foreignemploy_statushandling_officehow_injury_occurinjury_cityinjury_postalinjury_statejurisdictionlost_time_or_medicalonlynature_injury#dependentsosha_injury_typeseverity_indextime_injurytype_losspolicy_yrreforms_dummylength_employeddiff_carrier_employerdiff_employer_injuryshiftlength_how_injury
15397264.0NaNAbdomen including GroinStrain or Injury By38.0NaNMaleNaNVirginia18.0Strain/Injury by LiftingDomesticUnknown/OtherWC SOUTHEAPATIENT WAS LIFTING LENS CASES, STATES SOMETHING DID NOT FEERICHMOND23222VirginiaVirginiaMedical OnlyStrainNaNInjuryNo Serious Injury Indicated830Specific Injury2014NaN1.00.00.01st60
153981034.0NaNBrainStruck or Injured By48.0NaNMaleNaNFlorida20.0Struck by Motor VehicleDomesticUnknown/OtherWC SOUTHEAEE WAS PERFORMING A STUNT WHEN HE GOT HIT BY A PICTURE CAR ASTONE MOUNTAIN30087GeorgiaFloridaMedical OnlyConcussionNaNInjuryNo Serious Injury Indicated2230Specific Injury2014NaN1.0NaNNaN3rd60
15399926.0NaNLower LegCut, Puncture, Scrape Injured By60.0NaNMaleNaNArizona19.0Cut/Puncture/Scrape, Object Lift/HandledDomesticUnknown/OtherWC SOUTHEAEE STATES WHILE MOVING STEEL HE LACERATED HIS R UPPER ASPECTFAYETTEVILLE30214GeorgiaGeorgiaMedical OnlyInfectionNaNInjuryNo Serious Injury Indicated1100Specific Injury2014NaN1.013.05.02nd60
15400780.0NaNShoulder(s)Strain or Injury By39.0NaNFemaleNaNGeorgia8.0Strain/Injury by CarryingDomesticUnknown/OtherWC SOUTHEAWHILE LOADING THE TRAILER EE FELT DISCOMFORT IN R SHOULDER.SENOIA30276GeorgiaGeorgiaMedical OnlyStrainNaNInjuryNo Serious Injury Indicated1818Specific Injury2014NaN1.02.00.03rd59
154010.0NaNHandCut, Puncture, Scrape Injured By44.0NaNMaleNaNNorth Carolina8.0Cut/Puncture/Scrape, Hand ToolDomesticUnknown/OtherWC SOUTHEAEMPLOYEE WAS CUTTING ZIP TIES WITH A KNIFE WHEN IT SLIPPED OWILMINGTON28401North CarolinaNorth CarolinaMedical OnlyLacerationNaNInjuryNo Serious Injury Indicated830Specific Injury2014NaN1.00.00.01st60
154022405.0NaNKneeFall, Slip or Trip Injury21.0NaNFemaleNaNGeorgia6.0Fall/Slip, Into OpeningDomesticUnknown/OtherWC SOUTHEAEMPLOYEE WAS RETRIEVING PAINT SUPPLIES FROM A TRAILER WHEN SFAYETTEVILLE30214GeorgiaGeorgiaMedical OnlyInflammationNaNInjuryNo Serious Injury Indicated1115Specific Injury2014NaN1.00.00.02nd60
154031807.06486.0Shoulder(s)Strain or Injury By33.0NaNMaleNaNGeorgia20.0Strain/Injury by MiscDomesticFull-TimeWC SOUTHEAPATIENT WAS IN A SCENE THAT REQUIRED HIM TO RUN, STOP AND FAPEACHTREE CITYNaNGeorgiaGeorgiaLost TimeStrainNaNInjuryNo Serious Injury Indicated1558Specific Injury2014NaN1.00.00.02nd60
154040.0NaNAnkleStrain or Injury By33.0NaNMaleUnmarried, Single, Widowed, DivorcedVirginia22.0Strain/Injury by JumpingDomesticUnknown/OtherWC SOUTHEAPATIENT JUMPED OVER A 4' WOODEN FENCE, LANDED ON UNEVEN GROUHENRICO23238VirginiaVirginiaMedical OnlySprainNaNInjuryNo Serious Injury Indicated1900Specific Injury2014NaN1.01.00.03rd60
15405507.0NaNArmStrain or Injury By34.0NaNMaleNaNGeorgia3.0Strain/Injury by MiscDomesticUnknown/OtherWC SOUTHEAPATIENT STATED HE WAS INJURED WHEN PLASTERING WALLS. PRODUCTHIRAM30141GeorgiaGeorgiaMedical OnlyStrainNaNInjuryNo Serious Injury Indicated1041Specific Injury2014NaN1.01.00.02nd60
154060.0NaNLow Back AreaStrain or Injury By49.0NaNFemaleNaNMissouri11.0Strain/Injury by TwistingDomesticUnknown/OtherWC SOUTHEAWHILE PERFORMING REQUIRED JOB DUTIES, EE TWISTED HER BACK, ANASHVILLE37214TennesseeSouth CarolinaMedical OnlyStrainNaNInjuryNo Serious Injury Indicated1930Specific Injury2014NaN1.00.03.03rd60